
    The SOL Genomics Network Model: Making Community Annotation Work

    The concept of community annotation is a growing discipline for achieving participation of the research community in depositing up‐to‐date knowledge in biological databases.
The Solanaceae Genomics Network ("SGN":http://sgn.cornell.edu/) is a clade‐oriented database (COD) focusing on plants of the nightshade family, including tomato, potato, pepper, eggplant, and tobacco, and is one of the bioinformatics nodes of the international tomato genome sequencing project. One of our major efforts is linking Solanaceae phenotype information with the underlying genes, and subsequently the genome. As part of this goal, SGN has introduced a database for locus names and descriptors, and a database for phenotypes of natural and induced variation. These two databases have web interfaces that allow cross references, associations with tomato gene models, and in‐house curated information of sequences, literature, ontologies, gene networks, and the Solanaceae biochemical pathways database ("SolCyc":http://solcyc.sgn.cornell.edu). All of our curator tools are open for online community annotation, through specially assigned “submitter” accounts. 

Currently the community database consists of 5,548 phenotyped accessions and 5,739 curated loci, of which more than 300 loci were contributed or annotated by 66 active submitters, creating a database that is truly community driven.
This framework is easily adaptable for other projects working on other taxa (for example, see "http://chlamybase.org":http://chlamybase.org), greatly expanding the application of this user‐friendly online annotation system. Community participation is fostered by an active outreach program that includes contacting potential submitters via email, at meetings and conferences, and by promoting featured user‐submitted annotations on the SGN homepage. The source code and database schema for all SGN functionalities are freely available. Please contact SGN at sgn‐feedback[at]sgn.cornell.edu for more information.

    SGN Database: From QTLs to Genomes

    Quantitative trait loci (QTL) analysis is used to dissect the genetic basis underlying polygenic traits. Several public databases have been storing and making QTL data available to research communities. To our knowledge, current QTL databases rely on manual curation, where curators read literature and extract relevant QTL information to store in databases. Evidently, this approach is expensive in terms of expert manpower and time, and it limits the type of data that can be curated. At the Solanaceae Genomics Network (SGN) ("http://sgn.cornell.edu":http://sgn.cornell.edu), we have developed a database to store raw phenotype and genotype data from QTL studies, perform on-the-fly QTL analysis using the R/QTL statistical software ("http://www.rqtl.org":http://www.rqtl.org), and visualize QTLs on a genetic map. Users can identify peak and flanking markers for QTLs of traits of interest. The QTL database is integrated with other SGN databases (e.g., Marker, BACs, and Unigenes) and analysis tools such as the Comparative Map Viewer. Using the Comparative Map Viewer, users can compare chromosomes with QTL regions to genetic maps of interest from the same or different Solanaceae species. As the tomato genome sequencing advances, users can also identify corresponding BAC sequences or locations on the tomato physical map, which can be suggestive of candidate genes for a trait of interest.

Furthermore, at SGN, images, quantitative phenotype and genotype data, publications, and genetic maps generated by QTL studies are displayed and available for download. Currently, data from three F2 and two backcross population QTL studies on fruit morphology traits (18–46 traits per population) are available at the SGN website for viewing at population, accession, and trait levels. Traits are described using ontology terms. Phenotype data are presented in tabular and graphical formats such as frequency distributions with basic descriptive statistics. Mapping data showing the location of parental alleles on individual accession genetic maps are also available.

SGN is a public database hosted at the Boyce Thompson Institute, Cornell University, and funded by USDA CSREES and NSF.

    Plant Metabolic Pathways in MetaCyc and SolCyc

    MetaCyc is a metabolic encyclopedia of experimentally validated biochemical pathways curated from the scientific literature. It spans all organisms, with an emphasis on plants and microbes. Pathway Tools is a complex curation software suite that enables curation of reactions, construction of pathways, and annotation with one or more representative enzymes, including information such as substrate specificity, kinetic properties, activators, inhibitors, cofactor requirements, genes (if cloned), and links to external databases. In addition, curators are able to provide concise, review-level summaries and extensive literature citations. The present database release includes more than 1200 pathways from more than 1549 organisms, 7312 reactions, 5127 enzymes, 4748 genes, and 7234 chemical compounds, curated from 17916 citations. MetaCyc is the reference database against which PathoLogic predicts pathways from annotated genomes; the resulting databases are called Pathway/Genome Databases (PGDBs). The BioCyc Database ("biocyc.org":http://biocyc.org) has a collection of over 300 PGDBs. Each BioCyc database describes the genome and predicted metabolic pathways of a single organism, which are then taken up by interested groups for curation. SolCyc is one such PGDB collection, developed for the clade-oriented Solanaceae Genomics Network (SGN) database. It holds predicted metabolic pathway databases for significant species belonging to the Solanaceae and includes Lycocyc (tomato), Solacyc (eggplant), Nicotianacyc (tobacco), Petuniacyc (Petunia), Capcyc (Capsicum), and Potatocyc (potato). An interactive web interface has been developed for the seamless flow of information between the SGN phenotype and locus databases and SolCyc. This gives researchers the capacity to search for the underlying metabolic pathway information of genes and phenotypes that have been curated into the SolCyc database.

    solQTL: a tool for QTL analysis, visualization and linking to genomes at SGN database

    BACKGROUND: A common approach to understanding the genetic basis of complex traits is through identification of associated quantitative trait loci (QTL). Fine mapping QTLs requires several generations of backcrosses and analysis of large populations, which is a time-consuming and costly effort. Furthermore, as entire genomes are being sequenced and an increasing amount of genetic and expression data are being generated, a challenge remains: linking phenotypic variation to the underlying genomic variation. To identify candidate genes and understand the molecular basis underlying the phenotypic variation of traits, bioinformatic approaches are needed to exploit information such as genetic map, expression and whole genome sequence data of organisms in biological databases. DESCRIPTION: The Sol Genomics Network (SGN, http://solgenomics.net) is a primary repository for phenotypic, genetic, genomic, expression and metabolic data for the Solanaceae family and other related Asterid species, and houses a variety of bioinformatics tools. SGN has implemented a new approach to QTL data organization, storage, analysis, and cross-links with other relevant data in internal and external databases. The new QTL module, solQTL (http://solgenomics.net/qtl/), employs a user-friendly web interface for uploading raw phenotype and genotype data to the database, R/QTL mapping software for on-the-fly QTL analysis, and algorithms for online visualization and cross-referencing of QTLs to relevant datasets and tools such as the SGN Comparative Map Viewer and Genome Browser. Here, we describe the development of the solQTL module and demonstrate its application. CONCLUSIONS: solQTL allows Solanaceae researchers to upload raw genotype and phenotype data to SGN, perform QTL analysis and dynamically cross-link to relevant genetic, expression and genome annotations.
Exploration and synthesis of the relevant data is expected to help facilitate identification of candidate genes underlying phenotypic variation and markers more closely linked to QTLs. solQTL is freely available on SGN and can be used in private or public mode.

    Exploiting the diversity of tomato: the development of a phenotypically and genetically detailed germplasm collection

    A collection of 163 accessions, including Solanum pimpinellifolium, Solanum lycopersicum var. cerasiforme and Solanum lycopersicum var. lycopersicum, was selected to represent the genetic and morphological variability of tomato at its centers of origin and domestication: the Andean regions of Peru and Ecuador, and Mesoamerica. The collection is enriched with S. lycopersicum var. cerasiforme from the Amazonian region, which has not been analyzed previously nor used extensively. The collection has been morphologically characterized, showing diversity for fruit, flower and vegetative traits. The genomes of the accessions were sequenced in the Varitome project and are publicly available (solgenomics.net/projects/varitome). The identified SNPs have been annotated with respect to their impact, and a total of 37,974 out of 19,364,146 SNPs have been described as high impact by the SnpEff analysis. GWAS has shown associations for different traits, demonstrating the potential of this collection for this kind of analysis. We have identified not only known QTLs and genes, but also new regions associated with traits such as fruit color, number of flowers per inflorescence, and inflorescence architecture. To speed up and facilitate the use of this information, F2 populations were constructed by crossing the whole collection with three different parents. This F2 collection is useful for testing SNPs identified by GWAS, selection sweeps or any other candidate gene. All data are available on the Solanaceae Genomics Network, and the accession and F2 seeds are freely available at the COMAV and TGRC genebanks. All these resources together make this collection a good candidate for genetic studies.

    An ontology approach to comparative phenomics in plants

    BACKGROUND: Plant phenotype datasets include many different types of data, formats, and terms from specialized vocabularies. Because these datasets were designed for different audiences, they frequently contain language and details tailored to investigators with different research objectives and backgrounds. Although phenotype comparisons across datasets have long been possible on a small scale, comprehensive queries and analyses that span a broad set of reference species, research disciplines, and knowledge domains continue to be severely limited by the absence of a common semantic framework. RESULTS: We developed a workflow to curate and standardize existing phenotype datasets for six plant species, encompassing both model species and crop plants with established genetic resources. Our effort focused on mutant phenotypes associated with genes of known sequence in Arabidopsis thaliana (L.) Heynh. (Arabidopsis), Zea mays L. subsp. mays (maize), Medicago truncatula Gaertn. (barrel medic or Medicago), Oryza sativa L. (rice), Glycine max (L.) Merr. (soybean), and Solanum lycopersicum L. (tomato). We applied the same ontologies, annotation standards, formats, and best practices across all six species, thereby ensuring that the shared dataset could be used for cross-species querying and semantic similarity analyses. Curated phenotypes were first converted into a common format using taxonomically broad ontologies such as the Plant Ontology, Gene Ontology, and Phenotype and Trait Ontology. We then compared ontology-based phenotypic descriptions with an existing classification system for plant phenotypes and evaluated our semantic similarity dataset for its ability to enhance predictions of gene families, protein functions, and shared metabolic pathways that underlie informative plant phenotypes. 
CONCLUSIONS: The use of ontologies, annotation standards, shared formats, and best practices for cross-taxon phenotype data analyses represents a novel approach to plant phenomics that enhances the utility of model genetic organisms and can be readily applied to species with fewer genetic resources and less well-characterized genomes. In addition, these tools should enhance future efforts to explore the relationships among phenotypic similarity, gene function, and sequence similarity in plants, and to make genotype-to-phenotype predictions relevant to plant biology, crop improvement, and potentially even human health.

    Crop Ontology Governance and Stewardship Framework

    A governance and stewardship framework for the Crop Ontology Project is required, as this is a collaborative tool developed by a Community of Practice. Over the 12 years of its existence, it has increased significantly in scope and use. Collecting and storing plant trait data and annotating the data with ontology terms is widely accepted by the crop science community to be critical to enabling data interoperability and exchange through tools such as the Breeding API (BrAPI). The Crop Ontology Community of Practice is organised around roles, curation principles and validation processes that require a formal description. A governance framework is defined by the various actors involved in the asset’s design, development and maintenance. It is complemented by a quality assurance process to ensure that trust levels, value creation, and sustainability objectives meet appropriate quality levels. The general principles underlying data governance are integrity, transparency, accountability and ownership, stewardship, standardization, change management and a robust data audit.